ABSTRACT

"ANALYSIS OF POLICY APPLICATION OF EXPERIMENTAL RESULTS", 

Charles M. Achilles, E. Michigan Univ. 

Jayne B. Zaharias and Barbara A. Nye, COE, TN State Univ. 

In 1989-1990 Tennessee leaders funded 17 school districts to apply results of STAR, a 
longitudinal experiment causally linking class size and student achievement. Researchers have 
studied Project Challenge (1989-1995) by analyzing the statewide rankings of the 17 (now 16) 
participating systems. 

Reduced class sizes (1:15) have produced positive results in Challenge counties (n=17), as
indicated by mean ranks on pupil scores in Reading and Math of the Tennessee Comprehensive
Assessment Program (TCAP) at grade two: from 99 (Reading) and 85 (Math) in 1990 to 79 (Reading)
and 57 (Math) in 1993. Tennessee has 138 systems, so a rank of 69 is average. Challenge systems
(collectively) were below average in 1990; by 1993 they were above average in math and 20 ranks
closer to average in reading. Since by 1993 students in grade two would have had all three years
of treatment (1:15), one would not expect major gains in later years. That expectation was
substantiated, as the average ranks for reading and math remained fairly constant in 1993, 1994 and 1995.

After finding virtually identical results using the original Challenge analyses and Tennessee
Value-Added Assessment System (TVAAS) analyses of Challenge, researchers suggested using the
TVAAS database to evaluate Challenge, as it offers options for expanded analyses.

Class sizes of about 1:15 in Challenge systems accompanied achievement results in reading
and math that paralleled those predicted from the STAR experiment. This application of research
results seems justified. The TVAAS database offers a reasonable way to monitor Challenge-system
progress.

The paper also contains a fairly detailed Bibliography about Project STAR and other related
class-size studies in addition to the References.
Analysis of Policy Application of Experimental Results:
Project Challenge 



Introduction 



Some Background and Context 

In 1985 legislators and policy persons in Tennessee (TN) planned to set class-size policy for
early elementary grades. Before doing that they reviewed the extant literature and found that there
were no definitive answers about class size and pupil outcomes. They passed legislation and
appropriated funds for what became known as Project STAR (Student-Teacher Achievement
Ratio). The mandate for STAR was to determine the effects of small classes (a ratio of about 1:15)
on pupil achievement (test results) and development in early primary grades (K-3).

Project STAR was a statewide, longitudinal experiment employing strict controls, random
assignment of pupils and teachers, two treatments (1:15, and 1:25 with a full-time teacher aide)
and a control condition (1:25), using a within-school design. There were about 100 classes of each
condition during each of the study's four years. Students entered in K (1985-1986) and remained
in their assigned class type for grades K-3 (1985-1986 to 1988-1989). Random replacement was
used if students moved or entered STAR schools. Students took the Stanford Achievement Test
(SAT) as the Norm-Referenced Test (NRT) and Tennessee's Basic Skills First (BSF) as the
Criterion-Referenced Test (CRT) geared to the objectives of the TN curriculum. Researchers
collected much data on pupils, teachers, principals, schools, districts, etc. Although STAR began
with about 7100 pupils, by the end, due to mobility, about 10,000 pupils were included in the
database.
The TN legislature and State Department of Education provided funds to track STAR pupils,
who are in grade 9 in 1994-1995, to analyze the residual benefits of an early small-class start in
school. This continuing study, conducted by the Center of Excellence at TN State University, is
called the Lasting Benefits Study (LBS). Plans are to follow students academically until
graduation from high school. After graduation, researchers hope to study various education, work,
and social adjustment factors.

In 1989-1990, using funds formerly used to support STAR, the state established Project
Challenge in 17 (by 1992-1993, only 16) of TN's poor and educationally low-performing
counties. The funds were used for across-the-board class-size reduction in grades K-3 to about
1:15. There was no specific research or evaluation design for Project Challenge (although the state
did want some assessment or evaluation). All schools in the counties with grades K-3 were
eligible to participate in Challenge. Thus, Challenge was really a limited (17 counties) but, within
the affected districts, inclusive policy application of the positive results derived from the STAR
experiment.

Project STAR did find a substantial class-size effect, with small classes exceeding both
regular (R) and regular/aide (RA) classes on all measures by some .50 to .65 standard deviation
units (Effect Size, or ES). [Detailed results appear elsewhere; e.g., Achilles, Nye, Boyd-Zaharias, 
Fulton and Cain, 1994; Achilles, Nye, and Bain, 1994; Achilles, Nye, Boyd-Zaharias, and Fulton, 
1993; Finn and Achilles, 1990; Finn, Achilles, Bain, Folger, Johnston, Lintz and Word, 1990; 
Word, Johnston, Bain, Fulton, Boyd-Zaharias, Lintz, Achilles, Folger and Breda, 1990.] 
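
The effect sizes quoted above express the small-class advantage in standard deviation units. The sketch below shows one common way to compute such a value, the standardized mean difference with a pooled standard deviation (Cohen's d). This formula is an assumption for illustration (the STAR reports may have standardized differently), and the scores are invented.

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(treatment, comparison):
    """Standardized mean difference (Cohen's d) with a pooled standard deviation."""
    n1, n2 = len(treatment), len(comparison)
    s1, s2 = stdev(treatment), stdev(comparison)  # sample SDs (n - 1 denominator)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(comparison)) / pooled_sd


# Invented scores for illustration only (not STAR data).
small = [52, 55, 58, 61, 64]
regular = [49, 52, 55, 58, 61]
print(f"ES = {effect_size(small, regular):.2f}")
```

With these invented numbers the small-class mean is 3 points higher and the pooled SD is about 4.74, so the ES is about 0.63: the small-class mean sits roughly two-thirds of a standard deviation above the comparison mean.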

The LBS has been evaluated and reported at least minimally; e.g., Finn, Fulton, Boyd-Zaharias
and Nye, 1989; Nye, Boyd-Zaharias, Fulton, Achilles, Pate-Bain, 1991, 1992, 1993,
1994. Some additional studies using the STAR database have also been reported; e.g., Finn and Cox,
1992; Boyd-Zaharias, 1993. Results of the class-size intervention have been substantial,
especially in controlled studies. The question now seems to be: "How successful is class
size as an intervention to increase pupil achievement on a wide scale?"



Project Challenge 



Initial Evaluations * 

Project Challenge began in 1989-1990, and in TN the first statewide testing occurred at
grade 2, using the new (begun in 1989-1990) Tennessee Comprehensive Assessment Program
(TCAP), which is a combination of NRT and CRT measures. Thus, the Spring 1990 testing used
grade-2 results of the new TCAP, which meant that the "baseline" measure for the Challenge
comparisons really incorporated one year (grade 2) of small-class treatment in Challenge systems.
Researchers chose this option over trying to determine the comparability of data from the Stanford
Achievement Test (SAT) used before 1989-1990 and the TCAP used in 1989-1990 and later years.

Based on STAR results, researchers expected that the grade-2 test results would improve as
future cohorts of students who were tested in grade 2 experienced more years of reduced classes
(up to 3 years, including grades K, 1, and 2). The grade/test sequencing appears in Table 1 and
shows how each subsequent year would influence the length of time a student could be in
Challenge before the test at grade 2 (or tests at grades 3 or 4 in later years as data became
available).

* Material is substantially the same as presented in Achilles, Nye, Boyd-Zaharias, Cain and Fulton (1994, August).
Project Challenge Addendum. Nashville, TN: Center of Excellence for Research in Basic Skills, TN State
University. Unpublished manuscript.



Table 1 About Here 



Staff at the Center of Excellence (COE) for Research in the Basic Skills at Tennessee State
University worked with personnel at the Tennessee State Department of Education (SDE) to
"track" and evaluate student academic progress in Challenge. Personnel at the COE used published
data and data provided by the SDE for this purpose. Reports have been provided each year (Nye,
Achilles, Boyd-Zaharias, Fulton, 1991, 1992, and 1993), and articles discussing early results have
appeared elsewhere (e.g., Nye et al., 1993, 1994; Achilles et al., 1993). Since no funds were
provided for added testing, etc., the Challenge evaluation used only the reading and math scores
that students achieved on the TCAP each year. Data for comparison were the average rank each
year of the Challenge systems (n=17 in 1989-90, 1990-91, 1991-92, and 1992-93, and n=16 in
1993-94 and 1994-95) among the 138 Tennessee systems, so that a rank of 69 would be average.

(For comparability, n=17 is maintained in this paper although one system has dropped out of
Challenge.) Rankings were used to gauge Challenge's progress in replicating STAR's success in
achieving higher scores through class-size reductions in the elementary grades (especially grades K-3).
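
The Challenge indicator is the average of the participating systems' statewide ranks, where rank 1 is best, so a lower average means better performance. A minimal sketch with invented ranks follows; note that the exact midpoint of ranks 1 through 138 is (138 + 1) / 2 = 69.5, which the paper rounds to 69.

```python
N_SYSTEMS = 138
STATE_MIDPOINT = (N_SYSTEMS + 1) / 2  # mean of ranks 1..138 = 69.5


def mean_rank(ranks):
    """Average statewide rank of a group of systems (rank 1 = best)."""
    return sum(ranks) / len(ranks)


def compare_to_state(ranks):
    """Return the group's mean rank and whether it beats the state midpoint."""
    avg = mean_rank(ranks)
    verdict = "better than average" if avg < STATE_MIDPOINT else "not better than average"
    return avg, verdict


# Invented ranks for five systems, for illustration only.
group = [40, 61, 75, 90, 104]
print(compare_to_state(group))  # (74.0, 'not better than average')
```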

There was no attempt before 1994-1995 to verify the actual class sizes used in the Challenge
systems. The Project Challenge reports (1990-1994) were just gross indicators of achievement test
outcomes.
Refining The Challenge Database

By 1992-1993, better test data were becoming available statewide; these data could be used in
Challenge reports with no added testing. Tennessee policy makers had initiated the "Sanders
Model" as a way to track student achievement. These data could be used to analyze district,
building and even teacher successes as related to student test outcomes. In this process an
individual student data file of annual test results on the TCAP is constantly being built and updated.
This model is called the Tennessee Value-Added Assessment System or TVAAS, and data are now
(1995) available for pupils in grades 2-8. (Appendix A provides a brief description of TVAAS.)
The data can be aggregated at the classroom level, and the number (n) of pupils tested each year
provides an indicator of class size. (This may not be the number of students regularly in class for
instruction, as some students may miss a testing.) In future Challenge reports, more detailed
analyses of the Challenge class-size initiative will be provided by including class-size estimates
based on the (n) for each testing.

Since TVAAS was a potential new database for Challenge analyses, the first steps were 1) to
develop a baseline of Challenge results using the TVAAS database for 1990-91, 1991-92, 1992-93,
and 1993-94 to check the transition to TVAAS from the state-provided rankings used in the
early Challenge reports and 2) to compare original Challenge results with the TVAAS database
results. The next step will be to use the TVAAS database each year for the detailed analyses
that it makes possible, if the results of steps 1 and 2 (described above) warrant the
changeover.

Project Challenge and TVAAS

The TVAAS is a complex "mixed model" statistical process that considers a large number of
variables. Primarily, the model compares the gains by students in Tennessee to the national norm
gain, so that a "score" of 1.0 indicates that the Tennessee student (or class or school, depending on
the unit of analysis) made a mean gain in the year being analyzed that was equal to the national norm
gain. Thus, using TVAAS data, it is possible to rank not only the attained average scores (by
system, by school, or by class), but also the equivalent mean gain (compared to the national
norm gain). [When a state mean gain is computed, it is possible to rank each system in the state
each year (at any grade level) on each system's mean gain.]
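
The gain comparison described above can be sketched as a ratio of a unit's mean gain to the national norm gain, followed by a ranking of systems on that ratio. This is a simplification: the actual TVAAS estimates come from its mixed-model analysis rather than a raw ratio, and the system names, gains, and norm gain below are hypothetical.

```python
def gain_ratio(mean_gain, national_norm_gain):
    """1.0 means the unit's mean gain equaled the national norm gain."""
    return mean_gain / national_norm_gain


def rank_by_gain(system_gains, national_norm_gain):
    """Rank systems 1..N on gain ratio, largest ratio = rank 1 (ties not handled)."""
    ratios = {s: gain_ratio(g, national_norm_gain) for s, g in system_gains.items()}
    ordered = sorted(ratios, key=ratios.get, reverse=True)
    return {s: i + 1 for i, s in enumerate(ordered)}


# Hypothetical mean scale-score gains and a hypothetical norm gain of 20 points.
gains = {"System W": 24.0, "System X": 20.0, "System Y": 17.0, "System Z": 22.0}
print(rank_by_gain(gains, 20.0))
```

Dividing every system by the same norm gain does not change their order, so within one grade and year the ratio ranks systems exactly as the raw mean gains would; the ratio becomes useful when comparing grades or years whose norm gains differ.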

This analysis extends the Challenge Reports (Nye et al., 1991, 1992, 1993) by using
TVAAS data to compute system mean scores and the rank of those mean scores by grade (2, 3, 4)
for 1991, 1992, and 1993 on two TCAP test outcomes (reading and math). Details of the
TVAAS analysis are provided in the appendices: Appendix B shows the ranks based on system mean
TCAP scores and also the ranks based on each system's mean gains. Those data are then
transferred to Tables 2 and 3 to show the similarity of results from the two databases.

Tables 2 and 3 show the average (x̄) rank of Challenge systems (n=17) among the 138
Tennessee systems for reading and math for the years and grades indicated, as well as the ranks
based on mean gain (grade 3, grade 4, and cumulative for grades 3 plus 4) among the state's
systems.

Some STAR Results Relative to Challenge

Greatly simplified, STAR results were greatest for the 1:15 pupils at grades K and 1, with
some tapering off of additional gains in grades 2 and 3. Using this as a guideline, we might expect
Challenge (TCAP) results to be greatest once pupils who experienced their K-1 years in 1:15
classes had reached the grade levels (2, 3, 4) where they are tested. STAR results suggest that the
1:15 condition was primarily a preventive and not a remedial effort, and that it was a facilitative
variable: it should let teachers help pupils succeed in ways they cannot in 1:25
classes. This idea is borne out in efforts such as "Reading Recovery" or "Success for All," where
improved and intense instruction or extra interventions are employed for small groups of pupils at
an early time in the schooling process. Thus, we might even expect improved Challenge results
where and when the teachers begin using instructional methods and materials appropriate for
smaller classes or groups. Additionally, the STAR researchers suggested that as a treatment, the
1:15 experience occurs once (when the pupil enters the 1:15 environment) and then is continued.
That is, the 1:15 is a treatment when it first happens, and that is when (comparatively speaking)
most changes should be evident in test results. Since Tennessee did not test in K and 1, perhaps
evidence of major Challenge-initiated gains has been lost with no testing until grade 2. Finally,
there is no true baseline using TCAP, since the first year of TCAP testing at grade 2 occurred in
1989-1990, after pupils in grade 2 had already been in 1:15 for one year (grade 2). Appendix B
also shows the TCAP analysis for Challenge systems using the state-supplied aggregate ranks
from 1989-1990 through 1992-1993.

Preliminary Analysis of TVAAS Data: Tables 2 and 3 

Given the above information, the TVAAS data in Tables 2 and 3 are instructive. For
example, in Table 2, Grade 4 mean score ranks are consistently below the state average of 69, an
expected event since Grade-4 pupils had very little (or no) exposure to 1:15 at testing in 1990,
1991 and 1992, and especially since they had no exposure to 1:15 in grades K or 1, the years of
greatest gain for the class-size effect, as seen in Project STAR.

Data for grades 2 and 3, however, seem to show the "expected" impact of 1:15. In math
from 1991 to 1993 the Grade-2 ranks (rounded) go from 71 (below state average) to 57 (above the
average of 69); in reading the Grade-2 ranks go from 88 (below the state average of 69 by 19
places) to 79 (below the state average by only 10 places).
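
The rank arithmetic in this paragraph can be checked directly from the grade-2 values in Table 2. This small sketch rounds half up (so 56.5 becomes 57, as in the text), because Python's built-in round() uses round-half-to-even and would turn 56.5 into 56.

```python
import math

STATE_AVERAGE = 69  # the paper's (rounded) average rank among 138 systems


def round_half_up(x):
    # Python's round() rounds half to even (round(56.5) == 56), so round half up explicitly.
    return math.floor(x + 0.5)


def places_from_average(rank):
    """Positive = below (worse than) the state average; negative = above it."""
    return round_half_up(rank) - STATE_AVERAGE


# Grade-2 mean ranks of Challenge systems, 1991 and 1993, from Table 2.
grade2 = {"Math": (71.1, 56.5), "Reading": (88.2, 78.5)}
for subject, (r1991, r1993) in grade2.items():
    print(subject, places_from_average(r1991), "->", places_from_average(r1993))
```

The output reproduces the text: math moves from 2 places below average to 12 places above it, and reading from 19 to 10 places below.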
Data in Table 3 show that the mean ranks of Challenge systems (1991-1993) on grade 3
TCAP scores and the cumulative ranks for grades 3 and 4 scores are at or above the TN mean, and
that grade 4 results (as expected) are below the state average (grade 4 pupils were no longer in
1:15 classes when tested, and none had been in a 1:15 kindergarten by 1993).



Tables 2 and 3 About Here 



Table 4 shows the mean gains for 1994 and 1995 for grades 3, 4, and 5 and the cumulative
gains (3-5) at each testing. The drop in the rankings between grade 3 (the end of 1:15) and grade 4
may reflect the "fade" that was also found in STAR. The systems, however, did not drop to
pre-Challenge levels (see Appendix B). Since pupils in grades 4 and 5 were no longer in the treatment
(1:15), one might expect that their scores would not reflect the same positive results as when they
were in the 1:15 conditions. This is the case.



Prior Challenge Ranks vs. TVAAS Ranks 

A comparison of Challenge results based on TVAAS ranks and the ranks produced by the
grouped TCAP data provided to the Challenge researchers shows considerable similarity. Data for
comparisons discussed here are ranks from Nye et al. (1993, p. 10) for the 1991-1992 testings at
Grade 2 as shown in Table 3. These results (see Table 5) show that the TVAAS rankings are
about 4 places (3.8 in math and 3.7 in reading) better than those in the prior Challenge reports,
which used the less precise data (Nye et al., 1993, p. 10) that were available at the time. When the
re-analyzed data became available (1995), the differences were 1.9 in math (59.5-57.6) and 1.7 in
reading (86.9-85.2), with both TVAAS rankings slightly better than the prior computation using
state-supplied data. Based on this similarity, it seems appropriate for Challenge researchers to
continue to use the TVAAS database, as it is more detailed and as prior results compared using the
two databases are essentially the same. Use of the TVAAS database will help Challenge researchers
improve the assessment of Challenge results. The TVAAS database is updated each year based upon
new data, so there may be small changes in ranks depending upon the year that the rank was
computed. (An example is in Tables 5 and 6.)
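
The differences quoted above are simple subtractions of the (x̄) ranks in Tables 5 and 6. This sketch reproduces them; a negative value means the TVAAS rank is lower (better) than the rank in Nye et al. (1993). The paper reports the corrected differences as positive values, that is, as B minus the corrected TVAAS rank.

```python
def rank_difference(a, b):
    """A - B, rounded to one decimal place as in the tables."""
    return round(a - b, 1)


nye_1993 = {"Math": 59.5, "Reading": 86.9}         # (B) prior Challenge report, Table 5
tvaas_original = {"Math": 55.7, "Reading": 83.2}   # (A) TVAAS, Table 5
tvaas_corrected = {"Math": 57.6, "Reading": 85.2}  # TVAAS after the 1995 correction, Table 6

for subject in nye_1993:
    print(subject,
          rank_difference(tvaas_original[subject], nye_1993[subject]),
          rank_difference(tvaas_corrected[subject], nye_1993[subject]))
```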



Table 5 About Here 



TVAAS Information on Mean Gains 

The TVAAS information on mean gains for the Challenge systems is less clear than the
results based on ranks, and there are no prior Challenge reports to use as comparisons for mean
gains (Table 3). Since the mean gains are based on a gain from one year to the next and since the
first testing on the TCAP is in Grade 2, there can be no mean gain until grade 3 (gain from Grade 2
to Grade 3). The mean gains have more meaning and are more reliable after there are several data
points for computation and comparison for each individual in the database. A system's ranked
mean gain is a comparison of how well, among Tennessee's 138 systems, that system did in terms
of achieving the national norm gain for that year. Thus, the average mean-gain rank for the 17
Challenge systems (Table 3) of 49.7 for math and 44.2 for reading in Grade 3 (1991 testing) is for
pupils who, by 1991, could have had 1:15 treatment for 2 years (grades 2 and 3; Table 1). These
ranks are considerably better than the state average of 69 and show that, on average, Challenge
systems were gaining more compared to the national norm gain than was the average (x̄ = 69)
Tennessee system. (But they did have farther to go at the start!)
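
Because a gain needs two consecutive annual scores, a pupil first tested in grade 2 has no gain until grade 3. A minimal sketch of that rule, assuming a hypothetical per-pupil mapping from grade to TCAP scale score; the simple average in mean_gain is only an illustration, since the real TVAAS system estimates come from its mixed model.

```python
def yearly_gains(scores_by_grade):
    """Gains for each grade whose preceding grade was also tested.

    scores_by_grade maps grade -> scale score. Because TCAP testing starts
    at grade 2, the earliest possible gain is grade 2 -> grade 3.
    """
    return {
        g: scores_by_grade[g] - scores_by_grade[g - 1]
        for g in sorted(scores_by_grade)
        if g - 1 in scores_by_grade
    }


def mean_gain(pupils, grade):
    """Average gain at `grade` over pupils who have a gain for that grade."""
    gains = []
    for scores in pupils:
        pupil_gains = yearly_gains(scores)
        if grade in pupil_gains:
            gains.append(pupil_gains[grade])
    return sum(gains) / len(gains) if gains else None


# Hypothetical scale scores (not real TCAP values).
pupils = [{2: 560, 3: 598, 4: 625}, {2: 540, 3: 570}, {3: 600, 4: 630}]
print(yearly_gains(pupils[0]))  # {3: 38, 4: 27}
print(mean_gain(pupils, 3))     # 34.0
```

The third pupil has no grade-2 score, so it contributes a grade-4 gain but no grade-3 gain.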

Some of this improvement in ranking begins to taper off in 1992 and even more so in 1993,
especially in math. In reading in Grade 3 the Challenge systems exceeded the state average, but
there is a drop in Grade 4 in math. This suggests added research, such as a check into the curricula
and the materials (especially the math manipulatives and the special programs) available in these
poor counties to support advanced math concept mastery. These issues, and issues such as the
teachers' preparation levels and comfort levels with advanced math, will need further exploration
to help researchers understand the declining mean gains in math in Grades 3 and 4 in Challenge
systems.



Table 6 shows the TVAAS grade-2 results (all pupils had the full 1:15 treatment and were
still in 1:15 at testing) for the average rank (x̄) and the mean gain for Challenge systems. Both
ranks and gains are at or below the state average of 69 in most cases. The difference between
results in Tables 4 and 6 reflects the difference between when the pupil is in 1:15 and after the
pupil leaves 1:15. These results generally parallel the findings of both STAR and LBS.



A review of the class sizes in Project STAR showed that as STAR progressed, some classes
became "out of range," or there was a "bunching" of small (S) classes at the large end of (S) (e.g.,
pupil n=16, 17 or even 18) and a "bunching" of regular (R) classes at the small end of (R) (e.g.,
pupil n=22 or 23, rather than 25-27 or so). (See Appendix C.) Future use of TVAAS data can help
Challenge researchers sort out the class-size impact more specifically by providing more precise
data on the numbers of pupils in classes and by tracking pupils through the grades.
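
Once tested class sizes are available, out-of-range classes and bunching can be flagged mechanically. The sketch below assumes hypothetical target ranges for small (S) and regular (R) classes, chosen only for illustration; the ranges a study actually specifies should be substituted, and the class sizes are invented.

```python
# Hypothetical target ranges for illustration only; substitute the ranges a
# study actually specifies.
TARGET_RANGES = {"S": (13, 17), "R": (22, 27)}


def range_status(n, class_type):
    """Classify a tested class size against its assigned type's target range."""
    low, high = TARGET_RANGES[class_type]
    if n < low:
        return "below range"
    if n > high:
        return "above range"
    return "in range"


def share_in(sizes, values):
    """Fraction of classes whose size is one of `values` (a bunching indicator)."""
    return sum(1 for n in sizes if n in values) / len(sizes)


# Hypothetical tested class sizes.
small_classes = [14, 15, 16, 17, 17, 18]
regular_classes = [22, 23, 23, 25, 26]
print([range_status(n, "S") for n in small_classes])
print(share_in(small_classes, {16, 17, 18}), share_in(regular_classes, {22, 23}))
```

Here four of six small classes sit at 16-18 pupils and three of five regular classes at 22-23, the kind of bunching the text describes.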
Conclusions

The expanded data available (TVAAS) support the positive effects of 1:15 in the poorer
counties in Tennessee, at least through the 1995 testings in grades 2, 3 and 4. This report did not
add any insights into results related to actual class sizes (researchers assumed the 1:15 conditions
held generally throughout Challenge systems). The availability of TVAAS data should help
researchers prepare more detailed reports by comparing class-level results by class size, based on
the actual number of pupils tested. Results support the change from state-reported aggregate data
to the TVAAS database, as the differences (Grade 2, 1991-1992) in ranks using the two databases
were less than 4 places.

Because the TVAAS data began in 1990-1991, they do not show the considerable gains from
the 1989-1990 testings reported in the earlier Challenge reports (e.g., Nye et al., 1993, p. 10).
However, the similarities shown between TVAAS data and prior Challenge documents could be
verified if TVAAS data were available for the earlier years. There seems to be little reason to doubt
that reduced class sizes (1:15) in early primary grades (K-3) have assisted reading and math
achievement gains in the Challenge school systems. This observed gain would be expected based
on STAR results, and the "tapering off" of results in grades 3 and 4 can be substantiated in part by
both STAR and LBS results to date.

The expanded data from TVAAS into grades 3, 4 and 5 and the computations available on
test-score cumulative gains for selected grades make the TVAAS a valuable and useful way to
analyze Challenge and to identify areas for future research. Researchers now need to analyze the
"fade" found in Challenge and compare it to LBS results, both for the amount of the "fade" and to
determine at which grade the "fade" is most prevalent. Review of TVAAS data on Challenge
systems will open added areas for inquiry.
Project Challenge is one example of state-level policy persons accepting and using
substantive experimental research results as the basis for program decisions. Indeed, class-size
reduction can be an expensive option. [There are indications that it is not really an expensive
policy option if costs such as grade retention, special education placement, and future remediation
are accounted for, and if the increased participation in schooling (Finn and Cox, 1992; Finn et
al., 1989) and other social benefits (e.g., Weikart, 1989) are seriously taken into the finance and
performance equation.]

Evaluation of the Challenge initiative is providing evidence that the broad-scale policy
implementation of research results is working well even in poor counties. Use of expanding
databases (e.g., TVAAS) developed for purposes other than the direct evaluation of a project can
be helpful in reducing evaluation costs and in improving the scope of low-cost policy evaluations.
Tables

Table 1

Summary of Testing for Challenge Systems (1989-1990 to 1994) to Show How Long a Pupil
Could Have Been in 1:15 at the Time of Testing

Grade Tested, and Years of 1:15 Participation Possible

Year Tested            Grade   Yrs in 1:15*   Grades in 1:15
1990                     2          1          2
                         3          1          3
                         4          0          0
1991                     2          2          1&2
                         3          2          2&3
                         4          1          3
1992                     2          3          K&1&2
                         3          3          1&2&3
                         4          2          2&3
1993                     2          3          K&1&2
                         3          4          K&1&2&3
                         4          3          1&2&3
1994 and later years     2          3          K&1&2
                         3          4          K&1&2&3
                         4          4          K&1&2&3

* This presumes the maximum possible that a pupil could have experienced the 1:15 condition by
the time and grade of testing. Note that in Tennessee, kindergarten was not required until
1994.
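
The entries in Table 1 follow a simple rule: a pupil tested in the spring of a given year could have been in 1:15 for each of grades K-3, up to the tested grade, that the pupil attended in 1989-1990 or later. A minimal sketch of that rule, assuming normal one-grade-per-year progress, no retention, and continuous enrollment in a Challenge system (as the footnote notes, actual exposure could be less, for example because kindergarten was not required until 1994):

```python
FIRST_SPRING = 1990           # Challenge began in 1989-1990 (tested spring 1990)
PROGRAM_GRADES = range(0, 4)  # grades K (coded 0) through 3 had 1:15 classes


def grades_in_small_class(test_year, grade):
    """Grades a pupil tested in spring `test_year` at `grade` could have spent in 1:15.

    Assumes one grade per year, no retention, and continuous enrollment
    in a Challenge system (the maximum possible, as in Table 1).
    """
    grades = []
    for g in PROGRAM_GRADES:
        spring_in_g = test_year - (grade - g)  # spring of the year the pupil was in grade g
        if g <= grade and spring_in_g >= FIRST_SPRING:
            grades.append(g)
    return grades


def label(grades):
    """Format like Table 1: [0, 1, 2] -> 'K&1&2'; no grades -> '0'."""
    return "&".join("K" if g == 0 else str(g) for g in grades) or "0"


for year in (1990, 1991, 1992, 1993, 1994):
    for grade in (2, 3, 4):
        gs = grades_in_small_class(year, grade)
        print(year, grade, len(gs), label(gs))
```

Each printed row matches the corresponding row of Table 1.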
Table 2

Mean Scores (Reading and Math) of Challenge Systems (n=17) Ranked by Grade (2, 3, 4) for
1991-1993. (Tennessee had 138 systems, so state x̄ = 69)

                    1991                  1992                  1993
             (x̄) rank by grade    (x̄) rank by grade    (x̄) rank by grade
Subject        2      3      4       2      3      4       2      3      4
Math         71.1   65.2   80.8    55.7   57.4   77.8    56.5   65.9   82.2
Reading      88.2   88.9  102.0    83.2   85.6   91.3    78.5   77.3  102.0



Table 3

Mean Gains (TVAAS) in Reading and Math of Challenge Systems (n=17) Ranked by Grade (3 and
4) and Cumulative (Grades 3 plus 4) for 1991-1993. (Tennessee had 138 systems, so state x̄ = 69)

                    1991                  1992                  1993
             (x̄) rank by grade    (x̄) rank by grade    (x̄) rank by grade
Subject        3      4    3&4       3      4    3&4       3      4    3&4
Math         49.7   80.9   64.2    59.9   82.4   72.4    79.6   99.0   96.9
Reading      44.2   77.4   53.7    66.0   75.4   71.9    56.9   82.6   64.5
Table 4

Mean Gains (TVAAS) in Reading and Math of Challenge Systems (n=17) Ranked by Grade (3,
4, 5) and Cumulative (Grades 3-5) for 1993-1995

                         1994                               1995
             (x̄) rank by grade     Cum.        (x̄) rank by grade     Cum.
Subject        3      4      5     3-5 (x̄)      3      4      5     3-5 (x̄)
Math         78.8   91.1   74.2    91.5        83.3   87.5   85.1    93.7
Reading      62.1   91.5   82.5    83.1        63.4  101.0   83.2    87.6



Table 5

Comparisons Using 1991-1992 Grade-2 Results in Reading and Math of TVAAS Data and Prior
Challenge (Nye et al., 1993) Reports of Challenge Systems (n=17) on (x̄) Ranks, and Showing
1992-1993 TVAAS Results

Grade 2 (x̄ Rank)

             1991-1992 (A)   1991-1992 (B)       Difference   1992-1993
Subject      TVAAS           Nye et al. (1993)   A-B          TVAAS
Math         55.7            59.5                -3.8         56.5
Reading      83.2            86.9                -3.7         78.5
Table 6

TVAAS Display (1992-1995) Grade-2 Results in Reading and Math, Challenge Systems (n=17),
on (x̄) Ranks and on the Gain Ranked (Grades 2 to 3)

Grade 2 (x̄ Rank) and Gain (2-3)

               1991-92*        1992-93*        1993-94*        1994-95
Subject      Rank   Gain     Rank   Gain     Rank   Gain     Rank   Gain
Math         57.6   62.1     55.3   81.2     55.8   78.8     54.5   83.3
Reading      85.2   66.2     77.5   56.3     78.8   62.1     73.5   63.4

* Slightly different from Table 5 data due to the constant corrections made in the data and the
analysis. Note that with the correction (1995) there is less difference (A-B) in Table 5 above for
1991-92.
Tennessee Value-Added Assessment System
(TVAAS)
with
Answers to Frequently Asked Questions

William L. Sanders
Professor and Station Statistician
The University of Tennessee
Agricultural Experiment Station
Statistical & Computing Services

Sandra Horn
Media Specialist
Knox County Schools

Send queries to: Dr. William L. Sanders
The University of Tennessee
P.O. Box 1071
Knoxville, Tennessee 37901-1071

Tennessee Value Added Assessment System

The fundamental objective of K-12 education is to provide for
academic growth for each student consistent with his or her innate abilities
within a total curricular framework. To achieve this objective, appropriate
growth must occur each academic year. If a quantitative measure of
academic growth is available for each student, then a base is formed for an
assessment system which will determine progress toward the fundamental
objective of providing sustained academic growth for each student.

In 1990, the Tennessee Department of Education initiated the
Tennessee Comprehensive Assessment Program (TCAP) to provide
information on student academic progress in Tennessee. Since each
student in Tennessee in grades 2-8 is tested annually, scores in each of five
subjects are available as input into a system for objective assessment based
upon measures of growth from this testing process.

Since many factors affect rates of student learning, some of which
are outside the purview of the educational community, an effective
assessment system must be able to distinguish factors which can be
controlled within the educational process from other influences. The
Tennessee Value-Added Assessment System (TVAAS), often referred to as
the Sanders Model, was developed to provide this capability.

What TVAAS is:

TVAAS is a statistical process which provides measures of the
influence that school systems, schools, and teachers have on indicators of 
student learning. Initially, TVAAS will furnish this information on the system 
level for each school system in Tennessee for grades three through eight in 
math, science, reading, language, and social studies by using the scale 
scores from the Tennessee Comprehensive Assessment Program (TCAP). 
TVAAS will be extended to cover grades nine through twelve when subject 
matter specific tests that can provide comparable data for these grades have 
been developed and validated. TVAAS is mandated by the Educational 
Improvement Act which took effect July 1, 1992. 
TVAAS is based upon work completed by Dr. William L. Sanders and
Dr. Robert A. McLean using data from second through sixth grade students
from three systems: Knox County, Blount County, and Chattanooga City.
Their studies, based upon 65,000+ student records, yielded six primary
findings:
1. There were measurable differences among schools and teachers
with regard to their effect on indicators of student learning. 

2. The estimates of school and teacher effects tended to be consistent 
from one year to the next. 

3. Teacher effects were not site specific, i.e., a gain score could not be 
predicted by simply knowing the location of the school. 

4. Student gains were not related to the ability or achievement levels 
of the students when they entered the classroom. 

5. The estimate of school effects was not related to the racial 
composition of the student body. 

6. There was very strong correlation between teacher effects as 
determined by the data and subjective evaluations by principals and 
supervisors. 

Rigorous statistical theory underpins the TVAAS model. Sanders and 
McLean demonstrated that Henderson’s mixed-model methodology (for an 
introduction to this methodology see McLean, Sanders and Stroup, 1991), 
when applied in the context of educational outcome assessment, would 
eliminate most of the statistical problems previously identified as 
impediments to the use of student achievement data as part of an 
assessment process (McLean & Sanders, 1984). Thus, by basing TVAAS on 
statistical mixed model methodology, unbiased estimates of the influence of 
teachers, schools, and school systems on student learning rates can be 
obtained, even when extreme differences exist in students' environments and 
in students’ assignments to teachers. The robustness of the TVAAS model 
has been confirmed using computer simulations to evaluate "worst case
scenarios."

How it Works: 

TVAAS analyses the scale scores students make over a period of 
three to five years on the norm-referenced Items on the TCAP. Unlike 
stanines or percentiles that are used to rank students against their peers, the 
scale scores indicate a student’s cuncnt level of attainment in a subject. 
Whereas stanines and percentiles tend to remain relatively constant, scale 
scores are designed to increase from year to year as the student learns. 

The pattern of the scale scores over the child’s school career forms 
a profile of academic growth. Regardless of the level at which students 
enter the classroom, if they make progress, their academic gains will be
reflected in increased scale scores. By statistically aggregating the "dimples"
and "bubbles" in these curves over a population of students, the influence of
school systems, schools, and teachers can be fairly estimated. Achieving
this usually requires solving tens of thousands of simultaneous equations. As
an integral component of TVAAS, a software system to accomplish this
enormous computing task has been developed.
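The kind of computation described above can be illustrated at toy scale. The sketch below builds and solves Henderson's mixed model equations for the simplest possible case, under assumptions that are ours and not TVAAS's: each record is one student's gain, the model is gain = overall mean + teacher effect + error, R and G are multiples of the identity, and the variance ratio λ = σ²_e / σ²_u is fixed at an assumed value rather than estimated. Under those assumptions the equations reduce to [X'X, X'Z; Z'X, Z'Z + λI] [μ; u] = [X'y; Z'y]. The function names and data are hypothetical; TVAAS itself solves a far larger, multi-year system.

```python
from typing import Dict, List, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def teacher_effects(gains: List[Tuple[str, float]],
                    lam: float) -> Tuple[float, Dict[str, float]]:
    """Henderson's equations for gain = mu + u_teacher + e (hypothetical model).

    gains: (teacher_id, student_gain) pairs.
    lam:   assumed variance ratio sigma_e^2 / sigma_u^2 (G^-1 scaled by sigma_e^2).
    Returns the estimate of mu and the BLUP of each teacher effect.
    """
    teachers = sorted({t for t, _ in gains})
    pos = {t: i + 1 for i, t in enumerate(teachers)}  # index 0 holds mu
    k = len(teachers) + 1
    lhs = [[0.0] * k for _ in range(k)]
    rhs = [0.0] * k
    for t, y in gains:
        # One record adds 1 to the X'X, X'Z, Z'X, Z'Z cells it touches, y to X'y and Z'y.
        for i in (0, pos[t]):
            rhs[i] += y
            for j in (0, pos[t]):
                lhs[i][j] += 1.0
    for j in range(1, k):
        lhs[j][j] += lam  # the G^-1 term that shrinks teacher effects toward zero
    solution = solve_linear(lhs, rhs)
    return solution[0], {t: solution[pos[t]] for t in teachers}


if __name__ == "__main__":
    # Hypothetical one-year gains for three teachers.
    data = [("A", 12.0), ("A", 14.0), ("A", 13.0), ("B", 8.0), ("B", 10.0), ("C", 11.0)]
    mu, effects = teacher_effects(data, lam=2.0)
    print(round(mu, 2), {t: round(u, 2) for t, u in effects.items()})
    # 11.14 {'A': 1.12, 'B': -1.07, 'C': -0.05}
```

Teacher C, with a single student whose gain is close to the overall mean, receives an estimated effect near zero. Teachers A and B have more students and are estimated further from zero, but still less extreme than their raw means of 13 and 9 would suggest. That shrinkage is what keeps a few unusual students from dominating an estimate.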

A data base containing the merged records of all students in
Tennessee who have taken the TCAP tests during the past three years has
been constructed. At present (1992), it contains more than 1.6 million
student records. This number will continue to grow over time and will 
enable continued tracking of the academic growth of each student. 

The Educational Improvement Act (EIA) mandates that school 
system effects on the educational progress of students for grades three 
through eight, as determined through the use of TVAAS, will be reported for 
systems state-wide no later than April 1, 1993. This report will be available 
to the public and will be updated annually. 

The EIA sets July 1, 1994 as the deadline for issuing the first set of 
reports on individual school effects. This set of reports will also be available 
to the public and will be revised on a yearly basis. 

The individual teacher effects for teachers of grades three through 
eight are to be reported to the teacher, appropriate administrators, and 
school board members no later than July 1, 1995, according to the EIA. 
These reports relating to the influence of individual teachers on the rate of 
student learning will not be available to the public. Reports on all levels will 
be based on at least three years of data and no more than five years of data. 
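The three-to-five-year rule can be written as a small selection function. This is a hypothetical helper, not part of TVAAS: the name `reporting_window` is invented, and keeping the most recent years when more than five are available is an assumption the text does not state.

```python
from typing import Iterable, List, Optional

MIN_YEARS = 3  # "at least three years of data"
MAX_YEARS = 5  # "no more than five years of data"


def reporting_window(years_available: Iterable[int]) -> Optional[List[int]]:
    """Return the years a report would use, or None if there are too few.

    Assumption (not stated in the EIA text): when more than MAX_YEARS are
    available, the most recent ones are kept.
    """
    years = sorted(set(years_available))
    if len(years) < MIN_YEARS:
        return None  # not enough history to report an effect
    return years[-MAX_YEARS:]


if __name__ == "__main__":
    print(reporting_window([1990, 1991, 1992]))                    # [1990, 1991, 1992]
    print(reporting_window([1988, 1989, 1990, 1991, 1992, 1993]))  # [1989, 1990, 1991, 1992, 1993]
    print(reporting_window([1992, 1993]))                          # None
```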

The following are some of the questions that educators ask 
about TVAAS: 



There are so many things going on in my students’ lives, some of them 
traumatic. How much influence do I have on their progress, anyway? 

The child you receive this year tends to have the same learning 
ability, the same environment, the same emotional stability as s/he had in 
the past unless something traumatic happens -- drugs, serious illness, 
divorce of the parents, and so forth. When such a trauma occurs, students’ 
learning curves can change dramatically. Even under the best of
circumstances, a "learning curve" is not a smooth, elegant line. Rather, it is
a line marked with "bubbles" and "dents," denoting a child’s variable progress
through academic life. If recovery takes place, this curve will return to its
previous trajectory, with a dent marking the troubled time in the child’s life.
If, however, recuperation does not occur, the reference curve itself changes.
In other words, a very personal experience is reflected in the statistical
profile of the child. What this means to a teacher is that the child’s new
learning profile will be the base from which that child’s contribution to the
overall effect will be determined. Therefore, an individual teacher or school
does not have to be concerned about potential bias in the estimated effects
caused by an abrupt change in the growth pattern of individual students.

If I have a class of third graders who come to me reading on a first 
grade level, how can I possibly show up well on an assessment, no 
matter what I do? 

TVAAS is especially sensitive to the level a student has already achieved
in a subject area when you see the child for the first time. If your students
make progress, they will show a positive gain in their scores on the TCAP 
norm-referenced items. Whether the scores reflect a growth from a sixth to 
a seventh grade level or from a second to a third grade level, it is still a 
positive gain and it shows that your teaching has been effective. 

Won’t students who have less ability make smaller gains than bright 
students? Do you expect a child with an IQ of 80 to gain as much as a 
student whose IQ is 120? 

TVAAS determines the gain each group of students can be expected 
to obtain by considering their prior history of achievement. Thus, if children 
are taught in a manner consistent with their current level of attainment, then
appropriate gains are achievable. 



I serve a transient population, mostly children of military personnel. 
How can my teaching be assessed if half my students enter or leave 
sometime during the school year? 



Only scores of students who have been present in your class for at 
least 150 days of the school year will be used by TVAAS. The attendance 
figure will be the attendance of the child for the year and will be entered at 
the end of the school year, even though the child will probably be tested 
sometime before the 150th day of the school year. 
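The inclusion rule amounts to a filter applied once the end-of-year attendance figure is known. In this minimal sketch the record layout, the field names, the helper `eligible_records`, and the sample values are all hypothetical; only the 150-day threshold comes from the text.

```python
from dataclasses import dataclass
from typing import List

MIN_DAYS_PRESENT = 150  # threshold stated in the text


@dataclass
class StudentRecord:
    student_id: str
    teacher_id: str
    days_present: int   # attendance for the year, entered at year's end
    scale_score: float  # the test itself may have been taken before day 150


def eligible_records(records: List[StudentRecord]) -> List[StudentRecord]:
    """Keep only students present in the class for at least 150 days."""
    return [r for r in records if r.days_present >= MIN_DAYS_PRESENT]


if __name__ == "__main__":
    roster = [
        StudentRecord("s1", "T1", 172, 712.0),
        StudentRecord("s2", "T1", 98, 690.0),   # entered mid-year: excluded
        StudentRecord("s3", "T1", 150, 705.0),  # exactly 150 days: included
    ]
    print([r.student_id for r in eligible_records(roster)])  # ['s1', 's3']
```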



The question of transient students was one of the very first
problems that the developers of TVAAS addressed. Because of the ways in
which teacher effects and student scores interrelate with one another, it is
possible to take advantage of the "shingling" phenomenon. An explanation of
"shingling" is that the data of students a teacher has instructed in the past
overlap with those of students the teacher is now teaching. In addition, data
gathered on students before they come to a given teacher, together with the
performance of this teacher’s students under subsequent teachers, furnish
a detailed picture of the students’ progress, even if a large proportion of the
students move in or out of the system in any given year. Thus, by utilizing
the overlap, the whole "roof" can be covered.

Why are you using the norm-referenced questions to gather the scores 
for TVAAS? Some of those questions are much too hard for my 
students. Wouldn’t it be better to use the criterion-referenced items? 

Norm-referenced items cover a range from substantially below the 
grade level on which the students are being tested to substantially above 
that level. Your students aren’t supposed to know the answers to all of the 
questions. This is why these scores can assess gains even for students 
functioning above or below the grade to which they are assigned. 

The norm-referenced questions on the TCAP have been validated 
against a national sample of children in order to determine what a child on 
a certain grade level is expected to know. The gains that are normal from
year to year have also been validated on a national sample. The developers 
of TVAAS have found that, generally, Tennessee’s students achieve scores 
and gains that closely approximate the national averages. 

On the other hand, criterion-referenced items can only indicate 
whether or not a student has learned a specific piece of information. This 
is important information, but it doesn’t reveal knowledge about the gains a 
student has made from one year to the next. Assume that you teach fourth 
grade, and you have a student who is two years behind in math. You may 
teach this student a great deal in a year’s time, and yet s/he may still not be 
able to answer the criterion-referenced questions for the fourth grade. The
improvement this student has made could not be detected by
criterion-referenced items, but progress would be quite evident in 
performance on the norm-referenced items. 
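The fourth-grade example can be made concrete with invented numbers. In the sketch below the scale scores, the cut score, and the helper names are all hypothetical, not TCAP values; the point is only that a pass/fail criterion check reports the same result in both years while the scale-score difference registers the growth.

```python
from typing import NamedTuple

# Hypothetical grade-4 cut score for illustration only; not an actual TCAP value.
GRADE4_CUT_SCORE = 700.0


class Comparison(NamedTuple):
    met_standard_before: bool
    met_standard_after: bool
    scale_score_gain: float


def compare_views(prior_score: float, current_score: float,
                  cut: float = GRADE4_CUT_SCORE) -> Comparison:
    """Contrast a criterion-referenced (pass/fail) view with a scale-score gain."""
    return Comparison(
        met_standard_before=prior_score >= cut,
        met_standard_after=current_score >= cut,
        scale_score_gain=current_score - prior_score,
    )


if __name__ == "__main__":
    # A student well behind grade level who nonetheless grows substantially.
    print(compare_views(prior_score=610.0, current_score=660.0))
    # Comparison(met_standard_before=False, met_standard_after=False, scale_score_gain=50.0)
```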

My students are mostly from the inner city. Won’t that make a 
difference in their gain scores? 

The pilot studies revealed no relationship between the racial
composition of the student body and gain scores. Whether a school was an
inner city school or a suburban one was also found to be unrelated to the 
gains students made. 
Subsequent analysis of data from the TCAP data base does indicate
that measurable differences in mean gains do exist among school systems
and among schools within school systems. At most, only a small portion of 
these differences can be attributed to socio-economic factors. 

I am an art teacher. How will I be assessed? What about librarians,
guidance counselors, or physical education teachers?

Any subject with a curriculum to which scale scores can be applied 
could be evaluated by the TVAAS. However, there are no plans at present 
to develop tests for these special areas in grades 3 through 8. 

I have honor students all day. Their scores are already very high. How 
can you tell whether I’ve made a difference to them? 

Analysis of scale scores indicates that there is enough stretch in the
test to accommodate our top students. Even among gifted learners,
perfection is extremely rare. When it does occur, its effect on school and 
teacher gains is trivial. 

If my effectiveness is going to be judged by how well my students do 
on the TCAP, it really makes more sense to concentrate on teaching to 
the test than on trying to cover the whole subject, doesn’t it? 

The items that will be used by TVAAS must be "fresh,
non-redundant equivalent tests, replaced each year" [EIA, section 4(g)(7)].
Each TCAP test represents a carefully constructed "sample" of items over a
broad domain of possible items within each discipline. Since most of the
items will be new each year, it will be extremely difficult to predict what the
specific items will be for any given year. Thus, teachers measured to be
most effective will be those who teach subjects holistically rather than
teachers who concentrate on isolated facts and skills that have been tested
in the past. Teaching integrated subject matter is consistent with
research on how students learn best and is, therefore, also consistent with
good test scores.

What if a teacher gets a copy of the test and teaches it to fifth-grade
students? Won't that mean that student gains in the sixth grade won't
look very good, and neither will their teacher?

First of all, test security is of great importance, and the legislature
was fully aware of this fact. The EIA sets forth the following sanctions in
section 4(g)(9):
Any person found to have not followed security guidelines
for administration of the TCAP test, or a successor test,
including making or distributing unauthorized copies of the
test, altering a grade or answer sheet, providing copies of 
answers or test questions, or otherwise compromising the 
integrity of the testing process shall be placed on immediate 
suspension and such actions will be grounds for dismissal, 
including dismissal of tenured employees. Such actions 
shall be grounds for revocation of state license. 

Furthermore, it is possible through analysis of the data to recognize 
specific situations in which the test has been compromised. TVAAS is 
designed to "kick out" suspicious data for further examination. Additionally, 
the statistical processes which undergird TVAAS ensure that a specific effect
will not be unduly influenced by undetected inappropriate prior behavior.
This has been confirmed by computer simulation, which documents the
robustness of TVAAS. 
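The paper does not describe how TVAAS decides which data to "kick out." Purely as a stand-in, the sketch below applies a leave-one-out z-score rule: a class is flagged when its mean gain lies more than an assumed three standard deviations from the mean of the other classes. The rule, the threshold, the function name, and the sample gains are all assumptions made for illustration, not the TVAAS procedure.

```python
from statistics import mean, stdev
from typing import Dict, List

Z_THRESHOLD = 3.0  # assumed cut-off for illustration; not a TVAAS parameter


def flag_unusual_gains(class_mean_gains: Dict[str, float],
                       z: float = Z_THRESHOLD) -> List[str]:
    """Flag classes whose mean gain is more than z SDs from the other classes' mean.

    Leave-one-out: each class is compared with the remaining classes only, so a
    single extreme class cannot inflate the spread it is judged against.
    """
    flagged = []
    for cid, gain in class_mean_gains.items():
        others = [g for k, g in class_mean_gains.items() if k != cid]
        if len(others) < 2:
            continue  # stdev needs at least two values
        spread = stdev(others)
        if spread > 0 and abs(gain - mean(others)) / spread > z:
            flagged.append(cid)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical mean gains for six fifth-grade classes; "c6" is implausibly high.
    gains = {"c1": 10.0, "c2": 11.0, "c3": 9.0, "c4": 10.5, "c5": 9.5, "c6": 30.0}
    print(flag_unusual_gains(gains))  # ['c6']
```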

How can an assessment system that’s based on test scores encourage 
innovation in the classroom? 

TVAAS was conceived as a method of estimating the academic 
growth of each student over his or her school career in each subject. It 
does not suggest or prescribe a particular method for encouraging this 
growth. How you help your students learn is your decision. Typically, 
students perform well on standardized tests whenever good teachers, day 
after day, promote scholarship and make sound instructional decisions. 
Appendix B

Summary of Change in Ranks, 1989-1990 to 1992-1993, for Reading and Math. State of 
Tennessee Second-Grade TCAP Results for 17 Challenge Systems Using the State-Provided 
Aggregate Data (1989-1992) and TVAAS (1993) 



Tennessee Challenge Systems (N=17)
TCAP Scores: Grade 2

                               Reading (TOT)                Mathematics (TOT)
                          89-90  90-91  91-92  92-93     89-90  90-91  91-92  92-93
Sum of Ranks               1681   1591   1477   1335      1448   1336   1011    961
Mean Rank (Sum ÷ 17)       98.9   93.6   86.9   78.5      85.2   78.6   59.5   56.5

                               Reading (TOT)                Mathematics (TOT)
Differences               Gain in Rank  Average Gain     Gain in Rank  Average Gain
89-90 to 90-91                     +90          +5.3             +112          +6.6
90-91 to 91-92                    +114          +6.7             +325         +19.1
91-92 to 92-93                    +142          +8.4              +50          +2.9
89-90 to 92-93                    +346         +20.4             +487         +28.6


Note: The State has 138 districts; the average rank is approximately 68. A Grade-One analysis (1992)
shows both reading and math above the State average (56.8 and 62.8). Later analyses will be
conducted on different grade levels and by using various sub-tests of TCAP.
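The derived rows in Appendix B follow from the sums of ranks by simple arithmetic: each mean rank is the sum divided by the 17 systems; each gain in rank is the earlier sum minus the later one, since a smaller rank number means a better standing; and each average gain is that difference divided by 17. The short Python check below recomputes them. It takes the Mathematics 1989-90 sum as 1448, the value that agrees with the printed mean rank of 85.2 and the gains of +112 and +487; the function names are illustrative.

```python
from typing import Dict, List

N_SYSTEMS = 17
YEARS = ["89-90", "90-91", "91-92", "92-93"]

# Sums of ranks for the 17 Challenge systems. The Math 89-90 value is taken
# as 1448, the figure consistent with the printed mean rank and rank gains.
SUM_OF_RANKS: Dict[str, List[int]] = {
    "Reading": [1681, 1591, 1477, 1335],
    "Math": [1448, 1336, 1011, 961],
}


def mean_ranks(sums: List[int]) -> List[float]:
    """Mean rank per year: sum of ranks divided by the number of systems."""
    return [round(s / N_SYSTEMS, 1) for s in sums]


def rank_gains(sums: List[int]) -> Dict[str, int]:
    """Gain in rank between consecutive years and over the whole period.

    A lower rank number is better, so gain = earlier sum - later sum.
    """
    gains = {f"{YEARS[i]} to {YEARS[i + 1]}": sums[i] - sums[i + 1]
             for i in range(len(sums) - 1)}
    gains[f"{YEARS[0]} to {YEARS[-1]}"] = sums[0] - sums[-1]
    return gains


def average_gains(sums: List[int]) -> Dict[str, float]:
    """Average gain per system: each rank gain divided by the number of systems."""
    return {k: round(v / N_SYSTEMS, 1) for k, v in rank_gains(sums).items()}


if __name__ == "__main__":
    for subject, sums in SUM_OF_RANKS.items():
        print(subject, mean_ranks(sums), rank_gains(sums), average_gains(sums))
```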
Appendix C

Distribution of STAR classes by grade (K-3) by designation S (Small), R (Regular), and RA 
(Regular and Aide). 





        K (n classes)      1 (n classes)      2 (n classes)      3 (n classes)
        S     R     RA     S     R     RA     S     R     RA     S     R     RA


11 




















2 






12 


8 






2 






3 






2 






13 


19 






14 






16 






15 






A 14 


22 






18 






27 






17 






15 


23 




1 


31 






32 






31 






16 


31 


1 




16 


1 




29 


1 




31 




1 


17 


24 


4 


1 


33 


1 




19 






27 






18 




1 


2 


6 


2 




6 






10 


1 




B 19 




7 


6 


3 


4 


3 


1 


3 


3 


5 




4 


20 




6 


6 


1 


10 


6 




2 


1 




9 


13 


21 




14 


12 




18 


18 




7 


11 




11 


12 


22 




20 


20 




27 


15 




23 


21 




13 


16 


23 




16 


21 




19 


20 




20 


21 




10 


14 


24 




19 


14 




16 


11 




22 


25 




15 


14 


25 




6 


6 




7 


9 




9 


15 




16 


15 


C 26




4 


3 




5 


9 




6 


7 




5 


12 


27 




1 


6 




2 


4 




4 


1 




5 


8 


28 






1 




1 


2 




1 


0 




2 


6 


29 










1 


2 




2 


2 




2 


2 


30 










1 


1 














Total 


127 


99 


99 


124 


115


100 


133 


100 


107 


140 


90 


107 




325 


339 


340 


337 



A = range for (S); B = "out of range"; C = range for both (R) and (RA) classes. 
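Within each grade, the S, R, and RA column totals should add up to the grade total. The check below reads the grade-1 Regular total as 115 rather than the printed "15", since only 115 reconciles with the grade-1 total of 339; the data structure and the function name `total_mismatches` are illustrative.

```python
from typing import Dict, Tuple

# (S, R, RA) class counts per grade from the totals row of Appendix C.
COLUMN_TOTALS: Dict[str, Tuple[int, int, int]] = {
    "K": (127, 99, 99),
    "1": (124, 115, 100),
    "2": (133, 100, 107),
    "3": (140, 90, 107),
}
GRADE_TOTALS: Dict[str, int] = {"K": 325, "1": 339, "2": 340, "3": 337}


def total_mismatches() -> Dict[str, Tuple[int, int]]:
    """Return {grade: (S + R + RA, printed total)} for any grade that does not add up."""
    return {
        grade: (sum(counts), GRADE_TOTALS[grade])
        for grade, counts in COLUMN_TOTALS.items()
        if sum(counts) != GRADE_TOTALS[grade]
    }


if __name__ == "__main__":
    print(total_mismatches())  # {}
```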







